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Abstract.  The  EISPACK  routine  TINVIT  is  an  implementation  of  inverse  iteration  for  computing 
c  gen  vectors  of  real  symmetric  tridiagonal  matrices.  Experiments  have  shown  that  the  eigenvectors 
computed  with  TINVIT  are  numerically  less  accurate  than  those  from  implementations  of  Cuppen’s 
divide  and  conquer  method  (TREEQL)  and  of  the  QL  method  (TQL2).  The  loss  of  accuracy  can 
be  attributed  to  TINVIT’s  choice  of  starting  vectors  and  to  its  iteration  stopping  criterion. 

In  this  paper,  we  introduce  a  new  implementation  of  TINVIT  that  computes  each  eigenvector 
from  a  different  random  starting  vector  and  performs  an  additional  iteration  after  the  stopping 
criterion  is  satisfied.  We  present  a  statistical  analysis  and  the  results  of  numerical  experiments 
with  matrices  of  order  up  to  525  to  show  that  the  numerical  accuracy  of  this  new  implementation 
is  competitive  with  that  of  the  implementations  of  the  divide  and  conquer  and  QL  methods. 
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1  Introduction 


It  is  our  goad  to  determine  an  accurate  method  for  computing  all  eigenvalues 
and  eigenvectors  of  real  symmetric  tridiagonal  matrices  that  is  efficient  both 
sequentially  and  in  parallel.  Experimental  results  in  [13,  14]  indicate  that  bi¬ 
section  with  inverse  iteration  is  generally  the  fastest  and  most  efficient  paral¬ 
lel  eigensolver  on  a  distributed-memory  hypercube  multiprocessor  such  as  the 
INTEL  iPSC  and  that  it  is  also  the  fastest  sequential  method  for  many  prob¬ 
lems.  The  compt’*"d  eigendecompositions,  however,  are  less  accurate  than  those 
computed  by  existing  implementations  of  Cuppen’s  divide  and  conquer  method 
(TREEQL)  [4,  10]  or  the  QL  method  (TQL2)  [2,  19].  The  tested  implemen¬ 
tations  of  bisection  are  based  on  the  EISPACK  routine  BISECT  [19]  which 
produces  eigenvalues  to  high  absolute  accuracy  [19]  and  which  with  minor  mod¬ 
ification  would  produce  eigenvalues  to  high  relative  accuracy  [0],  The  loss  of 
accuracy  can  thus  be  attributed  to  the  tested  implementation  of  inverse  iter¬ 
ation  (EISPACK’s  TINVIT).  In  this  paper,  we  identify  the  factors  influencing 
the  accuracy  of  inverse  iteration  and  present  a  new  implementation  of  inverse 
iteration  that  computes  eigenvectors  to  high  absolute  accuracy. 

Suppose  that  T  is  an  n  x  n  real  symmetric  tridiagonal  matrix  with  eigende- 
composition 


where  the  diagonal  elements  A  are  the  eigenvalues  of  T  and  the  column  Uj 
of  the  orthogonal  matrix  U  is  the  eigenvector  associated  with  eigenvalue  A,. 
Given  an  accurately  computed  eigenvalue  Ay,  inverse  iteration  computes  the 
corresponding  eigenvector  uy  by  performing  the  power  method  with  the  shifted 
matrix  ( T  —  Ay/)-1: 
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Algorithm  1.1  (Inverse  Iteration) 

Select  a  starting  vector  y(°\ 

For  k  =  1,2,...  until  convergence : 

Solve  ( T  —  A as  for  r/fcb 

uj  =  y(k)/\\  y(t)  lb 

Representing  the  starting  vector  j/°)  as  a  linear  combination  of  the  eigen¬ 
vectors  =  53”  ,  i HUi  gives  for  the  first  iterate  j/1)  =  53?=i  7~aV~ u»-  If  the 
contribution  rjj  of  uy  in  j/°)  is  not  too  small  and  if  Ay  is  close  to  Ay ,  the  contri¬ 
bution  rjj/(Xj  —  Ay)  of  Uj  in  the  next  iterate  j/1)  is  large,  and  j/1)  is  a  better 
approximation  to  uy  than  is  j/°)  [21],  p.321.  Likewise,  in  the  next  iteration,  the 
contribution  of  uy  in  y(2)  increases  to  r?y/(Ay  —  Ay)2  and  so  on  for  subsequent 
iterations.  Thus,  the  iterates  t/4)  usually  converge  to  uy  in  only  a  few  iterations. 

If  all  eigenvalues  are  well-separated  and  if  the  starting  vector  in  each  eigen¬ 
vector  computation  contains  a  large  enough  component  p,-,  inverse  iteration 
using  shifts  Aj , . . . ,  An  in  turn  computes  an  orthogonal  set  of  eigenvectors.  How¬ 
ever,  if  some  eigenvalues  are  close  together,  inverse  iteration  as  outlined  in  Al¬ 
gorithm  1.1  produces  eigenvectors  that  are  not  orthogor  '.  in  additional  step 
to  orthogonalize  iterates  associated  with  close  eigenvalues  f '  n  necessary. 

This  discussion  of  the  inverse  iteration  algorithm  shows  that  if  the  eigenval¬ 
ues  A,-  are  determined  to  working  precision  (which  is  true  for  BISECT)  and  if 
the  linear  system  solution  and  the  orthogonalization  of  iterates  corresponding 
to  close  eigenvalues  are  carried  out  accurately  (which  is  true  for  TINVIT),  the 
overall  accuracy  of  inverse  iteration  is  determined  hy: 
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1.  the  choice  of  starting  vector, 

2.  the  reorthogonalization  criterion,  and 

3.  the  iteration  stopping  criterion. 

We  will  examine  the  EISPACK  routine  TINVIT  with  regard  to,  each  factor  in 
turn  and  gradually  improve  its  accuracy  to  that  of  the  implementations  of  the 
QL  and  Cuppen’s  divide  and  conquer  methods.  The  numerical  experiments 
involve  matrix  orders  up  to  n  =  525,  and  the  conclusions  drawn  from  them 
therefore  may  not  apply  to  much  larger  matrix  orders. 

This  paper  is  organized  as  follows.  A  simple  perturbation  result  is  presented 
in  Section  2  to  define  measures  of  high  absolute  accuracy  for  the  computed 
eigenvalue  decomposition.  The  EISPACK  implementation  TINVIT  is  described 
in  Section  3,  and  its  lack  of  numerical  accuracy  explored  in  Section  4.  A  new 
implementation  of  inverse  iteration  based  on  the  experiments  in  Section  4  is 
presented  in  Section  5.  The  numerical  accuracy  of  this  improved  implementation 
is  compared  to  implementations  of  Cuppen’s  divide  and  conquer  method  and 
of  the  QL  method  in  Section  6.  The  use  of  random  starting  vectors  in  the 
new  implementation  of  inverse  iteration  is  justified  by  a  statistical  analysis  in 
Section  7. 

2  The  Computed  Eigendecomposition 

This  section  shows  that  the  computed  eigendecomposition  has  high  absolute 
accuracy  if  its  residual  and  the  deviation  of  the  eigenvectors  from  orthogonality 
sue  small. 

Suppose  that  the  diagonal  elements  of  A  are  the  eigenvalues  of  T  =  U AUT 
in  descending  order 

^1  ^  ...  ^ 
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and  that  the  column  u;  of  the  orthogonal  matrix  U  is  the  eigenvector  associated 
with  A*  satisfying  ||  u,  ||2  =  1.  The  spectral  radius  of  T  is  denoted  by 

|A|mar  —  max{|Ax|,  |An|}. 

It  is  further  assumed  throughout  the  paper  that  the  matrix  T  is  unreduced, 
that  is,  that  none  of  the  immediate  sub-  or  superdiagonal  elements  of  T  is 
zero.  Otherwise,  the  matrix  would  consist  of  a  direct  product  of  disjoint,  lower 
order  matrices  whose  eigendecompositions  can  be  computed  independently  [21], 
p.315.  Although  an  unreduced  tridiagonal  matrix  has  distinct  eigenvalues  in 
exact  arithmetic,  it  may  still  have  computationally  coincident  ones  in  finite 
precision. 

Assume  that  the  computed  eigenvalues  satisfy  the  same  order  as  the  corre¬ 
sponding  exact  eigenvalues,  i.e., 

Ax  >  . .  •  An. 

The  accuracy  of  the  computed  eigendecomposition  U A UT  of  T  is  then  deter¬ 
mined  by  the  largest  residual  error  72  of  any  computed  eigenpair  and  by  the 
deviation  from  orthogonality  O  of  the  computed  eigenvectors: 

H  =  —  max  ]|  Tut  -  A<«,  ||2,  O  =  ||  UTU  -  I  U*. 

|  A]max  1^'^n 

The  particular  norms  for  72  and  O  were  chosen  because  they  are  convenient  to 
analyze  and  to  compute.  The  outcome  of  the  numerical  experiments  in  the  later 
sections  does  not  change  if  the  matrix  norm  O  is  replaced  by  the  vector  norm 
maxi<,<n  ||  ( UTU  —  7)e,-  ||oo.  Our  analysis  is  restricted  to  the  above  norm-based 
criteria;  other  quality  measures  that  are  applicable  when  T  is  known  to  very 
high  accuracy  are  discussed  in  [1,  5,  6,  7]. 
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Lemma  2.1  IfV  is  a  square  matrix  and 

E  =  VtV-I ,  E  =  VVT  -  I 

then  l|  E  M,  =  I!  EHj. 

Proof:  Let  V  =  YT.Xt  be  the  singular  value  decomposition  of  V,  then 
VTV  -  I  =  XE.2Xt  -  I  =  XZ2XT  -  XXT  =  X(E2  -  I)XT . 
Similarly,  VVT  - 1  =  Y(E2-I)YT.  Because  X  and  Y  are  orthogonal  matrices, 
II  E  lb  =  II  VTV  -  1 1|2  =  |j  V'V'7'  -  / 1|3  =  II  E  ||a. 

■ 

Theorem  2.1  below  shows  that  the  computed  eigendecomposition  U k(fT 
is  the  exact  eigendecomposition  of  a  matrix  T  +  E  close  to  T  if  residuals  and 
deviation  from  orthogonality  are  small.  The  error  matrix  E  is  in  general  neither 
symmetric  nor  tridiagonal. 

Theorem  2.1  Let  U XUT  be  the  computed  eigendecomposition  of  a  symmetric 
tridiagonal  matrix  T  and  let 

TZ  =  -J—  max  ||  Tin  -  A,*  ||2t  O  =  ||  UTU  -  I  U*,. 

|A|m«,  i<*<" 

Ifn<ei  and  O  <  f2,  then  there  exists  a  matrix  E  such  that 
T  +  E  =  U A UT ,  ||  E  ||2  <  \/n  ^|A|maie2  +  |A|morfi  \Jl  4-  y/ne^j  , 

where  |A|mar  =  max{|Av|,  |An|}. 

Proof:  Let 

Ei  = -J~{TU -U\),  E2  =  UtU-I,  E2  =  UUt~I. 

|A|  max 
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Because  for  any  n  x  n  matrix  A,  ||  A  JJ2  <  v/^maXi<«n  ||  Aei  H2  [11],  where  e< 
is  the  ith  canonical  vector, 

II  Ei  ||2  <  Vn  max  ||  E^i  ||2  =  y/nTZ  <  x/nei, 

l<i<n 

II  Ej  ||a  <  \/n  ||  Ei  1 1 00  =  VnO  <  y/nci. 

Lemma  2.1  is  now  used  to  hound  ||  UT  JI2: 

v^2  >  ||  El  ||2  =  II  El  II2  =  II  UUT  -I  lla  >  II  UUT  ||,  -  ||  1 1|,  =  ||^|||  -  1, 

where  the  last  equality  follows  from  ||  >iTA  H2  =  H^llj-  Therefore,  || H2  < 
1  +  y/n(l- 

From  the  definition  of  E\, 

TUUt  -  UAUT  =  |A|m0,£r1{/r> 

so  that 

UAUt  =  TUUt  -  \yma:EiUT  =  T(I  +  El)  -  \\\ma*EiUT  =  T+E, 
where  E  —  TEi  —  |A|m<i*£aUT,  and 

II  ^  lb  <  |A|mo*  ||  Ei  ||j  +  |A|mox||  Ei  ||2  ||  UT  ||2 

<  v/n^|A|  max*  2  +  |A|  max 

■ 

Under  the  assumptions 

||  TU  -UA  ||2  <  fi,  ||  UTU  -  I  ||2  <  ea, 

the  error  matrix  is  bounded  above  by 

II  E  H2  <  |A|max€2  +  fi"v/l  +  €1, 

which  is  independent  of  the  matrix  order. 
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3  The  EISPACK  Routine  TINVIT 


The  steps  for  computing  all  eigenvectors  of  an  unreduced  symmetric  tridiagonal 
matrix  T  by  inverse  iteration  Eire  given  in  Algorithm  3.1: 

Algorithm  3.1  (Basic  Implementation  of  Inverse  Iteration) 

For  j  =  n,  n  -  1, . . 1: 

1.  Choose  a  starting  vector  yj. 

2.  Solve  (T  -  Aj  I)zj  =  yj  for  Zj . 

3.  If  the  reorthogonalization  criterion  is  satisfied, 

orthogonalize  Zj  against  iterates  associated  with  computed  eigenvalues 
close  to  \j. 

4-  If  the  stopping  criterion  is  not  satisfied, 
set  yj  =  Zj  and  go  to  step  2. 

5.  The  computed  eigenvector  is  iij  =  Zj /||  Zj  ||2 - 


As  suggested  in  [20],  p.143  and  [21],  p.329,  the  EISPACK  implementation 
TINVIT  performs  the  linear  system  solution  in  step  2  by  Gaussian  Elimination 
with  partial  pivoting  and  the  orthogonalizations  in  step  3  by  the  modified  Gram- 
Schmidt  algorithm.  Because  TINVIT  yields  iess  accurate  eigenvectors  than  do 
existing  implementations  of  the  QL  method  (TQL2)  [2,  19]  or  Cuppen  s  divide 
and  conquer  method  (TREEQL)  [4,  10],  the  loss  in  accuracy  must  be  due  to 
one  or  more  of  the  following  three  factors:  the  choice  of  starting  vector,  the 
reorthogonalization  criterion,  and  the  stopping  criterion.  TINVIT  deals  with 
these  issues  as  follows. 
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3.1  The  Starting  Vector 

As  argued  in  Section  1,  a  good  starting  vector  y j  has  a  large  enough  contribution 
of  an  eigenvector  associated  with  the  current  eigenvalue  A^  to  yield  an  iterate 
with  dominant  components  in  the  eigenspace  associated  with  A;. 

Without  advance  knowledge  of  the  eigenvectors,  however,  it  is  difficult  to 
ensure  a  high  quality  starting  vector.  For  instance,  the  canonical  basis  vectors 
ei  and  en  should  not  be  used  as  starting  vectors  because  they  are  often  nearly 
orthogonal  to  some  eigenvectors  of  a  symmetric  tridiagonal  matrix  T  [20],  p.147. 
The  vector  of  all  ones  is  also  a  poor  choice  as  it  is  orthogonal  to  half  of  the 
eigenvectors  of  a  symmetric  tridiagonal  Toeplitz  matrix  [12]. 

Analytic  determination  of  a  good  starting  vector  is  complicated  by  the  role 
of  roundoff  error  in  inverse  iteration.  As  shown  in  [18.  20,  21,  22],  one  or 
two  iterations  in  finite  precision  arithmetic  are  generally  sufficient  to  produce  a 
significant  iterate  component  in  the  correct  direction  unless  the  starting  vector 
is  exactly  orthogonal  to  that  direction. 

In  agreement  with  [20],  p.147,  TINVIT  avoids  explicit  formation  of  the 
starting  vector  yj  as  follows.  The  tridiagonal  system  (T  —  A jl)zj  =  yj  is 
solved  by  using  Gaussian  Elimination  with  partial  pivoting  to  factor  the  matrix 
T  —  \j I  =  LjUj.  (Throughout  this  paper,  we  disregard  the  permutation  matrix 
for  simplicity).  The  vector  yj  is  chosen  so  that  the  result  of  the  forward  substi¬ 
tution  equals  e,  the  vector  of  all  ones.  That  is,  the  computation  —  yj  need 
never  be  carried  out,  and  the  solution  of  the  first  linear  system  in  each  eigen¬ 
vector  computation  amounts  to  solving  only  the  second  of  the  two  triangular 
systems  UjZj  =  e. 

When  computationally  coincident  eigenvalues  (i.e.,  eigenvalues  that  are  iden¬ 
tical  to  working  precision)  are  used  as  shifts,  their  iterates  converge  to  a  single 
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eigenvector  and  fail  to  span  the  whole  eigenspace.  However,  these  iterates  are 
very  sensitive  to  the  value  of  A;  [21],  p.329.  Wilkinson  suggests  that  the  com¬ 
putationally  coincident  eigenvalues  be  slightly  perturbed  so  as  to  make  them 
distinct  and  that  inverse  iteration  be  used  with  the  perturbed  eigenvalues  to 
produce  iterates  that  are  linearly  independent.  The  increased  distance  of  A^+j, 
. . .,  A1+t  from  A^  should  affect  only  the  speed  of  convergence  and  not  the  accu¬ 
racy  of  inverse  iteration  [21],  p.329. 

TINVIT  replaces  computationally  coincident  eigenvalues  A,-  =  A,_i  =  ■  •  •  = 
Ai_*  by 

A<  <  Aj  +  <mI|T||u  <  •  •  •  <  A i  +(k  -  l)e.vf||r||fl, 
where  e.v/  is  machine  epsilon,  and 

II  T  Hn  =  max  {|a,|  +  |/?;|} 

K;<n 

for  a  matrix  T  with  diagonal  elements  Qrj and  off-diagonal  elements 

07 . 0n,  and  =  0.  An  unreduced  tridiagonal  matrix  T  satisfies  ||  T  ||H  < 

Halloo- 

3.2  The  Reorthogonalization  Criterion 

The  above  strategy  for  perturbing  computationally  coincident  eigenvalues  is  in¬ 
tended  to  produce  computed  eigenvectors  that  are  linearly  independent.  To 
assure  orthogonal  computed  eigenvectors,  the  iterates  associated  with  close 
eigenvalues  are  reorthogonalized  against  each  other.  In  TINVIT,  two  adjacent 
eigenvalues  A j  and  AJ  +  i  are  considered  close  if 

A;  -AJ+1  <  10-3||  T  ||R. 

The  process  of  reorthogonalizing  the  iterates  is  different  in  sequential  and  par¬ 
allel  implementations  and  proceeds  as  follows. 


9 


In  a  sequential  implementation  the  eigenvectors  are  computed  successively, 
according  to  the  ascending  order  of  the  eigenvalues.  That  is,  at  the  time  of 
computation  of  itj,  the  computation  of  the  eigenvectors  iij+i,  . . un  has  al¬ 
ready  been  completed.  If  a  computed  eigenvalue  A j  is  close  to  the  computed 
eigenvalue  AJ  +  i  then  the  iterate  Zj  is  reorthogonalized  against  Uj+i  and  against 
all  eigenvectors,  against  which  xij+i  was  orthogonalized. 

In  the  parallel  implementation  of  [13],  all  iterates  z<  associated  with  a  set  of 
close  eigenvalues  are  computed  simultaneously  in  lock  step.  If  A;-  is  close  to  A;+i 
then  the  iterate  Zj  is  orthogonalized  "gainst  zJ+i  and  against  all  iterates,  against 
which  Zj+i  was  orthogonalized.  Although  the  sequential  and  parallel  implemen¬ 
tations  could  have  been  based  on  identical  algorithms,  the  orthogonalization  of 
Zj  against  the  intermediate  iterates  z<  is  used  in  the  parallel  implementation  to 
allow  pipelined  reorthogonalization  of  eigenvectors  [13]. 

In  the  outline  of  TINVIT  below,  the  data  structure  CHJSTER(i)  contains 
the  indices  i  +  1, . . . ,  i  +  k  of  all  those  vectors,  against  which  z<  must  be  orthog- 
oualized. 

3.3  The  Stopping  Criterion 

In  [20],  p.145,  Wilkinson  shows  that  if  am  iterate  Zj  has  a  large  norm  after 
reorthogonalization  but  before  normalization,  the  eigenpair  (Aj,z;)  has  a  small 
residual  error.  Specifically,  the  large  iterate  norm  e.\r||z;||2  =  fl(n-1/2)  leads  to 
the  small  residual  ||(T  —  \jl)zj\\i  =  0{e\tn~1^2)  [20],  p.145.  Furthermore,  the 
large  norm  of  Zj  (after  reorthogonalization)  indicates  that  the  iterates  associated 
with  A j  +  i, . . . ,  A„  were  linearly  independent  (before  reorthogonalization)  so  that 
the  computed  eigenvector  u j  =  Zj/\\  Zj  jja  is  orthogonal  to  u;+i,...,un.  (The 
connection  between  larga  iterate  norm  and  successful  orthogonalization  by  the 
modified  Gram-Schmidt  procedure  is  demonstrated  in  Section  4.1). 
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From  the  perturbation  result  in  Section  2  vve  can  then  conclude  that  (A; ,  zj) 
is  an  eigenpair  of  a  matrix  close  to  T  and  hence  that  Zj  is  an  accurate  eigenvector. 
Because  INIa  ^  II  zj  llooi  the  two-norm  can  be  replaced  by  the  cheaper  infinity 
norm  for  convergence  testing.  Thus,  if  ||  yj  |joo  =  1  and  cm\\  zj  Hoq  >  1,  then 
zj  is  a  good  eigenvector  approximation.  The  difficulty  lies  in  determining  just 
how  large  ||  Zj  ||co  should  be.  TINVIT  stops  iteration  if  cm\\  z  ||oo  >  1  (ignoring 
scaling  factors). 

3.4  Implementation  of  Inverse  Iteration 

A  sketch  of  the  EISPACK  routine  TINVIT  is  given  as  Algorithm  3.2  below.  Nu¬ 
merical  details  such  as  scaling  factors  used  to  prevent  overflow  are  not  included. 
The  computed  eigenvalues  are  in  descending  order  Ai  >  ...  >  An,  and  cm  is 
machine  epsilon. 
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Algorithm  3.2  (Outline  of  TINVIT) 

For  j  =  n,  n  —  1 . . . ,  1 

0.  Perturb  computationally  coincident  eigenvalues: 

if  j  <  n  and  Xj  —  Ay+i  <  0,  then  replace  X  j  with  AJ+i  +  €m  ||  T  ||H. 

1.  Initialize  the  set  of  eigenvalues  close  to  Xj :  CLUSTER(j)  =  0. 

If  j  <  n  and  Xj  -  Xj+i  <  10-3||  T  |(H,  then 

CLUSTER {j)  =  CLUSTERS  +  1)  (J  O'  +  *}• 

2.  Initialize  the  iterate  norm  <Tj  =  0. 

3.  Loop  until  e\[<Tj  >  1  ('error  exit  after  5  iterations) 

3.  a  Factor  ( T  —  Xj  I)  =  L}Uj. 

3.b  If  this  is  the  first  iteration,  solve  U:Zj  —  e, 
otherwise  solve  LjUjZj  =  . 

S.c  Sequential  implementation: 

Reorihogonalize  Zj  against  all  u*  with  i  G  CLUSTER(j). 
Parallel  implementation: 

Reorihogonalize  Zj  against  all  Zi  with  i  6  CLUSTER(j). 

3  d  Set  Oj  =  ||  Zj  Hoc  and  yj  =  Zj. 

4-  The  computed  eigenvector  is  iij  =  y;/||  yj  ||2. 
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4  Experimental  Results 

The  experimental  results  in  [14]  sjjow  that  TQL2  or  TREEQL  generally  yield 
residuals  TZ  =  max*  ||  Tui  —  A jtij  ||2  less  than  10-14  for  matrix  orders  n  <  525; 
and  deviations  from  orthogonality  O  =  ||  UTU  —  I  ||oo  less  than  10~14  for  n  =s 
32,  less  than  10“ 13  for  n  ta  100,  and  less  than  10“ 12  for  n  ss  512  (a  similar 
dependence  on  the  matrix  order  occurs  when  the  deviation  from  orthogonality 
is  instead  measured  by  O  =  majq  Jj  (UTU  -  7)e,-  U2).  The  EISPACK  routine 
TSTURM  (a  combination  of  BISECT  and  TINVIT)  yields  respective  residuals 
TZ  less  than  10-14,  less  than  10-13,  and  less  than  10-12  and  orthogonality 
measures  O  less  than  10~12,  and  10-1°  for  matrix  orders  32,  100,  and 

512,  respectively. 

The  numerical  experiments  in  this  section  were  designed  to  determine  which 
features  of  TINVIT  need  to  be  modified  so  that  it  is  at  least  as  accurate  in  prac¬ 
tice  as  the  QL  routine  TQL2  [19]  and  the  divide  and  conquer  routine  TREEQL 
[10].  All  experiments  were  performed  in  double  precision  on  a  single  Sequent 
Symmetry  S81  processor  using  the  Weitek  1167  floating-point  accelerator.  The 
eigenvalues  were  computed  with  the  EISPACK  routine  BISECT  to  working  pre¬ 
cision.  Because  we  tested  only  matrices  up  to  orders  of  n  =  525,  our  conclusions 
may  not  apply  to  much  larger  matrix  orders. 

This  paper  presents  representative  results  selected  from  the  ones  in  [14].  The 
first  test  matrix  [1,2,1]  illustrates  the  case  of  matrices  without  close  eigenvalues. 
The  matrix  [1,2,1]  is  a  symmetric  tridiagonal  Toeplitz  matrix  of  order  n  having 
twos  on  the  diagonal  and  ones  on  the  first  sub-  and  superdiagonals.  Its  exact 
eigenvalues  are  well-separated  and  given  by  [12] 

Xj  =2^1  +  cos  -  J  ,  1  <  j  <  n. 
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For  matrix  orders  n  <  525,  the  computed  eigenvalues  A ,  are  also  well-separated. 

The  second  test  matrix  is  the  ‘glued  Wilkinson  matrix’  W*  and  represents 
one  of  the  most  difficult  test  cases  for  dealing  with  groups  of  close  eigenvalues. 
It  is  constructed  as  follows.  The  ‘Wilkinson  matrix'  of  order  n  =  21  has 
diagonal  elements  10, 9, . . . ,  1, 0, 1, . . 9, 10  and  immediate  off-diagonal  elements 
equal  to  one.  It  possesses  pairs  of  eigenvalues  that  are  very  close  [21],  p.309. 
The  spacing  between  eigenvalues  in  a  pair  decreases  with  increasing  magnitude 
of  the  eigenvalues,  and  the  eigenvalues  in  the  largest  pairs  are  computationally 
coincident  with  regard  to  double  precision.  The  glued  Wilkinson  matrix  W+  of 
order  21  j  is  formed  by  placing  j  copies  of  Wj]  along  the  diagonal  of  the  matrix 
and  setting  off-diagonal  elements  equal  to  10“14  at  the  positions  /?2i>/?42,  •  •  • 
where  the  submatrices  join.  For-^natrix  orders  n  >  200,  W+  has  clusters  of 
eigenvalues  near  the  integers  1, 2, . . (2J  [17]. 

The  conclusions  drawn  from  numerical  experiments  with  these  two  matrix 
types  are  supported  by  tests  on  random  matrices  in  [14]. 

4.1  Starting  Vectors 

In  this  section,  we  examine  the  influence  of  the  starting  vector  on  the  accuracy 
of  inverse  iteration  and  on  the  number  of  iterations  performed.  To  this  end,  we 
use  the  following  vectors  as  starting  vectors  for  the  computation  of  ti; : 

1.  the  ‘correct’  eigenvector  Uj :  this  starting  vector  is  the  eigenvector  Uj  com¬ 
puted  by  inverse  iteration  with  a  random  starting  vector.  Each  starting 
vector  «i,...,«n  has  residuals  72  <  10“ 14  for  all  orders  and  orthogonal¬ 
ities  O  <  10“ 14  for  n  <  42,  O  <  10" 13  for  n  <  105,  and  O  <  10" 12  for 
n  <  525. 
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2.  «„  +  TUj :  this  linear  combination  of  un  and  Uj  is  used  as  starting  vector 
to  compute  itj  for  1  <  j  <  n  —  1,  and  a  random  starting  vector  is  used  to 
compute  un.  When  r  =  0  and  A„_i  >  An,  un  +  riij  is  roughly  orthogonal 
to  the  eigenvectors  associated  with  Ai, . . . ,  An_j.  Increasing  the  value  of  r 
amounts  to  increasingihe  contribution  of  the  desired  eigendirection  in  the 
starting  vector  and  thus  determines  the  minimal  size  of  the  contribution 
that  is  sufficient  for  convergence. 

3.  random  vectors:  these  vectors  have  uniformly  distributed  pseudorandom 
components  between  -1  and  I  generated  with  the  linear  congruential  ran¬ 
dom  number  generator  from  NETLIB.  For  each  matrix  order  n,  a  single 
n  x  n  random  matrix  is  generated.  In  one  set  of  experiments,  we  use 
the  first  column  of  this  matrix  as  the  starting  vector  for  all  eigenvectors. 
In  the  second  set  of  experiments,  we  use  column  j  of  the  matrix  as  the 
starting  vector  for  the  jth  eigenvector. 

4.  the  TINVIT  starting  vector  y;- :  this  starting  vector  is  not  computed  explic¬ 
itly.  Instead  it  is  assumed  to  be  the  right-hand  side  of  the  lower  triangular 
system  L;e  =  yj,  where  e  is  the  vector  of  all  ones,  and  T  —  Xj I  =  LjUj  is 
the  LU  decomposition  (disregarding  the  permutation  matrix). 

For  the  purposes  of  this  section,  TINVIT  was  modified  to  perform  the  same 
number  of  iterations  for  all  eigenvectors.  Iteration  was  continued  until  the 
required  accuracy  was  achieved  but  for  not  more  than  five  iterations.  Compu¬ 
tationally  coincident  eigenvalues  were  perturbed  as  in  step  0  of  Algorithm  3.2 
except  when  different  starting  vectors  were  used  for  each  shift.  For  different  ran¬ 
dom  starting  vectors,  the  rate  of  convergence  and  accuracy  of  inverse  iteration 
are  preserved  even  if  computationally  coincident  eigenvalues  are  not  perturbed, 
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that  is,  even  if  step  0  of  TINVIT  is  omitted.  The  following  two  sections  distin¬ 
guish  between  the  experimental  results  for  the  cases  of  well-separated  and  close 
eigenvalues. 

4.1.1  Starting  Vectors  for  Matrices  with  Well-Separated  Eigenvalues 

Table  1  shows  the  number  of  iterations  required  by  inverse  iteration  to  compute 
the  eigenvectors  to  the  same  accuracy  as  TQL2  or  TREEQL  for  each  type  of 
starting  vector. 

High  accuracy  is  achieved  in  one  iteration  only  when  accurately  computed 
eigenvectors  Uj  are  the  starting  vectors.  More  than  two  iterations  are  needed 
only  for  larger  n  and  only  when  the  starting  vector  is  orthogonal  or  nearly 
orthogonal  to  the  computed  eigenvector  (r  <  10~16).  All  other  starting  vectors 
require  two  iterations. 

Thus,  for  matrices  [1,2,1]  of  order  n  <  512,  a  starting  vector  component  rjj 
of  magnitude  10~8  in  the  desired  direction  Uj  suffices  for  rapid  convergence, 
i.e.,  two  iterations.  Performing  more  iterations  than  listed  in  Table  1  does  not 
significantly  change  the  accuracy.  These  results  are  supported  by  numerical 
experiments  on  random  matrices  with  minimal  eigenvalue  spacing  of  lO-4  [14], 
In  summary,  when  all  eigenvalues  are  well-separated  the  performance  of  in¬ 
verse  iteration  does  not  strongly  depend  on  the  starting  vector:  random  starting 
vectors  and  the  TINVIT  starting  vector  provide  a  large  enough  component  in 
the  desired  direction  for  fast  convergence. 

4.1.2  Starting  Vectors  for  Matrices  with  Groups  of  Close  Eigenval¬ 
ues 

Table  2  shows  the  number  of  iterations  for  the  glued  Wilkinson  matrix  W+  with 
n  =  42,  105,  and  525  for  the  different  starting  vectors.  As  for  matrix  [1,2,1], 
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starting 

vector 

n  =  32 
number  of 
iterations  for 
K  <  10"14 

O  <  10“14 

n  =  100 
number  of 
iterations  for 

n  <  io-14 

O  <  10-13 

n  =  512 
number  of 
iterations  for 
■R  <  10"14 

O  <  10"12 

1 

1 

1 

«n 

2 

2 

4 

tk„  +  10-16U; 

2 

2 

3 

u„  +  10-%- 

2 

2 

2 

un  +  10  ~2Uj 

2 

2 

2 

same  random 

2 

2 

2 

different  random 

2 

2 

2 

TINVIT 

2 

2 

2 

Table  1:  Number  of  inverse  iterations  to  compute  eigenvector  iij  for  matrix 
[1,2,1]  of  order  n.  The  same  number  of  iterations  is  performed  for  each  uj. 
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accurate  eigenvectors  are  produced  in  one  iteration  only  when  the  starting  vector 
is  the  eigenvector.  Two  iterations  suffice  when  the  starting  vector  has  a  correct 
component  of  size  at  least  10~8  or  when  a  different  random  starting  vector  is 
used  for  each  eigenvector.  The  remaining  starting  vectors  require  more  than  two 
iterations.  For  n  =  525,  inverse  iteration  does  not  converge  in  five  iterations 
when  the  iterations  are  started  with  tin,  u„  +  10-16tiy  or  with  the  TINVIT 
starting  vector. 

Table  3  illustrates  the  connection  between  the  convergence  rate  of  the  iterates 
and  their  linear  dependence  for  different  types  of  starting  vectors  and  the  matrix 
W*.  The  numbers  in  Table  3  were  obtained  as  follows.  The  iterates  zy,  1  < 
j  <  n,  before  the  reorthogonalization  step  3.c  in  the  first  iteration  of  TINVIT 
compose  the  columns  of  an  fl  X  n  matrix.  The  smallest  singular  value  of  this 
matrix  is  listed  in  the  first  column  of  numbers,  and  the  smallest  norm  try  attained 
by  the  Zj ,  I  <  j  <  n,  after  reorthogonalizing  in  step  3.c  is  listed  in  the  second 
column  of  numbers.  The  same  information  is  given  for  zy  in  the  second  iteration 
of  TINVIT  in  the  last  two  columns.  These  data  show  that  except  in  the  case 
of  different  random  starting  vectors,  the  iterates  after  the  first  iteration  are 
linearly  dependent.  Thus,  the  modified  Gram-Schmidt  procedure  breaks  down 
and  produces  vectors  that  are  almost  zero.  Likewise,  the  second  iteration  fails 
to  produce  linearly  independent  iterates  for  all  but  the  different  random  starting 
vectors.  (The  singular  value  for  the  matrix  of  iterates  from  the  same  random 
starting  vector  is  so  small,  10-18,  that  the  iterates  can  be  considered  numerically 
linearly  dependent). 

While  the  TINVIT  starting  vectors  are  difficult  to  analyze,  the  other  choices 
suggest  a  possible  correlation  between  linearly  dependent  starting  vectors  and 
iterates:  linearly  dependent  starting  vectors  lead  to  linearly  dependent  iterates 
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starting 

vector 

n  =  42 
number  of 
iterations  for 

n  <  nr14 

O  <  10- : 14 

n  =  105 
number  of 
iterations  for 

n  <  io-14 
o  <  10“13 

n  =  525 
number  of 

iterations  for 

n  <  10~14 

O  <  IO"12 

1 

i 

1 

«n 

3 

3 

>  5 

3 

3 

>  5 

u„  +  10_aUj 

2 

2 

2 

u„  + 10~2u j 

2 

2 

2 

same  random 

2 

3 

3 

different  random 

2 

2 

2 

TINVIT 

2 

3 

>  5 

Table  2:  Number  of  inverse  iterations  to  compute  eigenvector  iij  for  the  glued 
Wilkinson  matrix  W+  of  order  n.  The  same  number  of  iterations  is  performed 
for  each  Uj . 
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starting  vector 

smallest 
singular 
value  of 
first 
iterates 

minimal 
iterate  norm 
min,-  ||  Zj  Hoc- 
after 

one  iteration 

smallest 
singular 
value  of 
second 
iterates 

minimal 
iterate  norm 
min i  ||  zj  1 1 oo 
after 

two  iterations 

0 

0 

0 

1.24<f  —  13 

same  random  vector 

0 

4.69d-  12 

IQ-18 

7.04d-  04 

different  random  vector 

0.02 

4.94d  —  04 

0.08 

>  1.00 

TINVIT 

0 

4.94d  —  12 

0 

1.06d  —  12 

Table  3:  Singular  values  of  the  matrix  of  iterates  and  smallest  iterate  norm 
for  the  glued  Wilkinson  matrix  W*  of  order  n  =  525  after  one  and  after  two 
iterations  of  TINVIT. 
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matrix  order 

smallest  singular  value 

42 

.0362 

100 

.0341 

105 

.0198 

512 

l.d-229 

525 

.0128 

Table  4:  The  smallest  singular  value  for  a  matrix  of  different  random  starting 
vectors. 

in  the  case  of  computationally  coincident  eigenvalues.  Table  4  shows  that  the 
smallest  singular  value  for  a  matrix  composed  of  n  different  random  starting 
vectors  is  much  larger  than  zero;  the  only  exception  is  n  =  512  where  the  512th 
column  is  linearly  dependent  on  the  first  479.  Because  none  of  the  test  matrices 
has  a  group  of  eigenvalues  including  both  the  479th  and  the  512th  eigenvalues, 
inverse  iteration  starts  out  with  linearly  independent  random  starting  vectors 
for  all  iterates  associated  with  the  same  group  of  close  eigenvalues. 

In  summary,  when  a  different  random  starting  vector  is  used  to  compute  each 
eigenvector  of  the  glued  Wilkinson  matrix,  both  the  starting  vectors  and  the 
iterates  are  highly  likely  to  be  linearly  independent.  This  correlation  between 
linear  dependence  of  starting  vectors  and  number  of  iterations  can  be  observed 
to  a  lesser  degree  for  other  large  matrices  with  groups  of  close  eigenvalues  [14]. 

4.2  Stopping  Criterion 

The  experimental  results  in  this  section  show  that  TINVIT’s  choice  of  stopping 
criterion  causes  inverse  iteration  to  stop  before  highest  accuracy  is  attained. 
We  will  examine  an  alternative  that  consistently  improves  the  accuracy.  For 
the  experiments  in  this  section,  TINVIT  was  modified  to  compute  each  eigen¬ 
vector  from  a  different  random  starting  vector  and  to  use  unperturbed  computed 
eigenvalues  as  shifts. 
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matrix 

iteration 

minimal 
iterate  norm 

min,'  II  lloo 

maximal 

residual 

TZ 

deviation  from 
orthogonality 

O 

SljiffS 

1 

>  1.00 

3.18  d-  14 

3.05  d-  12 

■H 

2 

>  1.00 

1.58d  —  16 

3.20d  —  14 

w+ 

g 

1 

0.14 

1 .47d  —  13 

1.68d  —  11 

n  =  105 

2 

>  1.00 

8.70d  —  16 

3.85d  —  15 

[1,2,1] 

1 

>  1.00 

1.60d  —  11 

4.62d-  09 

n  =  512 

2 

3.93d  —  16 

1.57d  —  13 

Wf 

1 

— 

4.94d-  04 

3.42d-  09 

6.02d-  07 

n  =  525 

2 

>  1.00 

5.99d  —  15 

1.98d  —  14 

Table  5:  Iterate  norm,  residual,  and  orthogonality  for  matrices  [1,2,1]  and  W+ 
after  one  and  after  two  inverse  iterations.  A  different  random  starting  vector  is 
used  for  each  eigenvector  computation. 
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For  matrices  [1,2,1]  and  W+,  Table  5  shows  how  the  accuracy  of  the  com¬ 
puted  eigendecomposition  depends  on  the  norm  of  the  computed  iterates  (after 
reorthogonalization  in  step  3.c  of  Algorithm  3.2).  The  TINVIT  stopping  crite¬ 
rion  works  correctly  for  both  orders  of  the  glued  Wilkinson  matrix  W+:  unit 
iterate  norm  and  full  accuracy  are  both  attained  on  the  second  iteration.  It  fails, 
however,  on  the  matrix  [1,2,1]  where  ail  iterates  have  greater  than  unit  norm 
but  less  than  full  accuracy  on  the  first  iteration.  The  same  conclusions  can 
be  drawn  from  experiments  with  random  matrices  in  [14].  It  seems,  therefore, 
that  at  least  two  iterations  should  always  be  performed  regardless  of  iterate 
norm  when  different  random  starting  vectors  are  used.  In  other  words,  after 
the  iterate  norm  is  large  enough  and  the  loop  in  step  3  of  TINVIT  is  exited, 
perform  one  more  iteration.  The  additional  iteration  was  already  suggested  in 
[21],  p.324,  but  was  not  implemented  in  TINVIT. 

We  have  not  found  a  simple  correlation  between  size  of  the  iterate  norms 
and  the  number  or  size  of  the  groups  of  close  eigenvalues. 

4.3  Reorthogonalization 

For  the  purposes  of  this  section,  TINVIT  was  modified  to  compute  each  eigen¬ 
vector  from  a  different  random  starting  vector  and  to  perform  one  more  iteration 
after  the  iterate  norm  becomes  large  enough,  t.e.,  one  more  iteration  after  exit¬ 
ing  the  loop  in  step  3  of  TINVIT.  In  the  numerical  experiments  below,  we  vary 
the  distance  at  which  adjacent  eigenvalues  are  considered  to  be  so  close  as  to 
require  orthogonalization  of  the  associated  iterates. 

Table  6  shows  the  residuals  7 Z  and  deviations  from  orthogonality  O  for  ma¬ 
trix  [1,2,1]  of  order  n  =  100  as  the  reorthogonalization  criterion  is  varied  from 
0  to  1010||  T  Ujj  after  one  and  after  two  inverse  iterations.  These  data  con¬ 
firm  that  more  orthogonalization  is  not  a  substitute  for  extra  iterations  because 
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criterion 


number 

one  iteration  j 

two  iterations 

criterion 

of 

vectors 

n 

O 

71 

O 

orthog- 

onalized 

i 

►— » 
o 

o 

H 

sT 

99 

6.09d-13 

6.41d-15 

2.12d-16 

5.75d-15 

(all) 

i 

o 

97 

1.87d-12 

7.40d-15 

2.13d-16 

6.12d-15 

10~2||  T  IU 

33 

7.69d-14 

4.97d-12 

1.67d-16 

1.82d-15 

10-3||  T  ii« 

2 

3.18d-13 

3.05d-12 

1.58d-16 

3.20d-l4 

10~ 5 1|  T  11^ 

1 

2.75d-ll 

1.90d-16 

4.28d-14 

0 

0 

2.28d-13 

2.68d-ll 

1.61d-16 

4.68d-14 

Table  6:  Accuracy  for  matrix  [1,2, 1]  with  different  reorthogonalization  criteria 
when  n  =  100. 

small  residuals  are  not  attained  until  the  second  iteration  even  with  reorthog¬ 
onalization  of  all  eigenvectors.  Table  7  shows  thr  same  situation  for  W+  when 
n  =  105  and  n  =  525,  as  well  as  the  fraction  of  inverse  iteration  time  spent  in 
the  modified  Gram-Schmidt  procedure. 

Increasing  the  reorthogonalization  criterion  beyond  that  of  TINVIT  does  not 
significantly  improve  the  accuracy  for  matrices  [1,2, 1]  and  W+.  It  can,  how¬ 
ever,  substantially  increase  the  computation  time.  With  the  TINVIT  criterion 
10~3||T||/r,  most  of  the  eigenvectors  of  W*  are  reorthogonalized  (80%  when 
n  =  105  and  96%  when  n  =  525),  but  reorthogonalization  occurs  in  many  small 
groups.  In  contrast,  with  the  criterion  1010||!T||/?  all  eigenvectors  are  reorthogo¬ 
nalized  as  one  group,  and  the  cost  rises  markedly  although  the  accuracy  hardly 
changes. 

These  experiments  show  that  the  best  possible  orthogonality  can  generally 
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order 

criterion 

7 Z 

O 

number 

of 

vectors 

time 

for 

MGS 

fraction 

MGS 

time 

n  =  105 

1010||  T  11^ 
(all) 

1.97d-16 

4.57d-15 

104 

30.6 

.67 

10-MI  r  II* 

1.60d-16 

3.14d-15 

88 

27.7 

.10 

10-3||T||H 

8.70d-16 

3.85d-15 

84 

1.4 

.07 

10-5||  T  ||fl 

2.09d-16 

2.52d-13 

78 

1.4 

.05 

0 

1.85d-10 

2.05 

0 

0 

0 

n  =  525 

1010||  T  ||fi 
(all) 

7.58d-15 

1 ,53d- 14 

524 

5807.1 

.98 

io-l|l  r||* 

4.69d-15 

3.51d-14 

523 

5287.6 

.73 

10-3||  T  ||fi 

5.99d-15 

1.98d-14 

504 

338.1 

.15 

10_5||  T  ||* 

3.46d-15 

6.24d-13 

498 

253.1 

.12 

0 

2.04d-16 

_ 

6.38 

_ 

0 

0 

0 

Table  7:  Accuracy  and  computation  time  for  W+  after  two  inverse  iterations 
with  different  reorthogonalization  criteria  when  n  =  100  and  n  =  525.  The 
last  column  shows  the  fraction  of  reorthogonalization  (MGS)  time  in  inverse 
iteration. 


be  attained  only  by  the  time-consuming  process  of  reorthogonalizing  all  eigen¬ 
vectors.  The  accuracy  desired  here,  however,  can  usually  be  achieved  by  means 
of  the  TINVIT  reorthogonalization  criterion  along  with  different  random  start¬ 
ing  vectors  and  the  improved  stopping  criterion  of  Section  4.3. 

5  A  New  Implementation  of  Inverse  Iteration 

The  improvements  to  inverse  iteration  developed  in  Section  4  are  incorporated 
into  the  following  algorithm.  These  changes  are  based  on  experiments  in  Sec¬ 
tion  4  with  matrix  orders  n  <  525  and  may  not  apply  to  much  larger  matrix 
orders. 
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Algorithm  5.1  (Improved  Inverse  Iteration  Algorithm  (III)) 

For  j  =  n,  n—  1, . . 1 

1.  Initialize  the  set  of  eigenvalues  close  to  Xj :  CL  USTER(j)  =  0. 

If  j  <  n  and  A  j  -  A  j  +  1  <  10“3||  T  ||fl,  then 
CLUSTER(j)  =  CLUSTERS  +  1)  U  {i  +  !}• 

2.  Generate  a  random  vector  Xj  with  uniformly  distributed  components 
in  the  interval  [-1,1],  and  form  the  starting  vector  yj  =  Xj/\\  Xj  ||2- 

3.  Initialize  the  iterate  norm  <Tj  =  0. 

4 ■  Loop  until  e\f<Tj  >  1  (error  exit  after  5  iterations). 

4.a  Solve  (T  -  A jl)zj  =  yj. 

4-b  Sequential  implementation: 

Reorihogonalize  Zj  against  all  u,-  with  i  £  CLUSTER(j). 

Parallel  implementation: 

Reorihogonalize  Zj  against  all  Z{  with  i  £  CLUSTER(j). 

4.c  Set  Oj  =  ||  Zj  Hoc  and  yj  =  Zj . 

5.  Repeat  step  4  once. 

6.  The  computed  eigenvector  is  iij  =  y;  /||  y;-  ||2 - 

Tables  8  and  9  compare  the  computation  time  of  the  EISPACK  routine 
TSTURM  with  that  of  the  EISPACK  routine  BISECT  combined  with  algorithm 
III  (B/III).  Because  of  the  additional  iterations  performed,  the  computation 
time  of  III  is  substantially  higher  than  that  of  TINVIT.  For  matrix  [1,2,1]  of 
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TSTURM 


B/III 


n 

time  to 

time  to 

time  to 

compute 

compute 

n 

O 

compute 

n 

O 

eigen- 

eigen- 

eigen- 

values 

vectors 

vectors 

(seconds) 

(seconds) 

(seconds) 

32 

1.1 

0.3 

4.15d-15 

4.00d-13 

0.4 

1.30d-16 

4.27d-15 

100 

11.3 

2.0 

2.46d-14 

8.48d-12 

3.2 

1.56d-16 

3.15d-14 

512 

276.7 

72.2 

1.26d-13 

4.48d-U 

125.8 

4.11d-16 

1.78d-13 

Table  3:  Times,  residuals,  and  orthogonalities  for  eigensystems  computed  by 
TSTURM  and  by  B/III  for  matrix  [1,2, 1]. 


order  n  =  512,  however,  very  little  orthogonalization  of  eigenvectors  takes  place 
(see  Table  6),  and  eigenvector  computation  is  cheap  compared  to  eigenvalue 
computation.  The  longer  time  of  algorithm  III  represents  only  a  13%  increase 
in  total  computation  time  for  B/III  over  TSTURM. 

The  storage  requirements  for  algorithm  III  are  the  same  as  for  TINVIT. 
The  time  for  generation  of  random  starting  vectors  in  algorithm  III  is  small 
compared  to  the  total  computation  time.  It  constitutes  less  than  4%  of  the 
total  eigenvector  computation  time  for  matrix  [1,2,1]  of  order  n  <  512  and 
for  W+  of  order  n  <  525.  A  1000  x  1000  matrix  of  random  elements  can  be 
generated  in  14.00  seconds. 

6  Comparison  with  Other  Methods 


This  section  offers  an  experimental  comparison  of  Cuppen’s  divide  and  conquer 
method,  the  QL  method,  and  bisection  with  inverse  iteration.  The  respective 
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TSTURM 

B/III 

n 

t’me  to 
compute 
eigen¬ 
values 
(seconds) 

time  to 
compute 
eigen¬ 
vectors 
(seconds) 

n 

O 

time  to 
compute 
eigen¬ 
vectors 
(seconds) 

n 

O 

42 

1.6 

0.4 

4.25d-15 

2.3d-13 

0.6 

1.61d-16 

2.61d-15 

105 

4.72 

3.0 

5. 1  Id- 14 

2.36d-12 

4.8 

6.98d-16 

4.43d-15 

525 

23.3 

171.1 

1.14d-13 

4.08d-ll 

333.4 

5.55d-15 

1.69d-14 

Table  9:  Times,  residuals,  and  "Crthogonalities  for  eigensystems  computed  by 
TSTURM  and  by  B/III  for  matrix  W+ . 


implementations  are  TREEQL  [10],  TQL2  [19],  and  B/III.  TREEQL  switches 
from  divide  and  conquer  to  TQM»for  subproblems  of  order  n  <  50.  For  a 
given  problem,  the  relative  speeds  of  the  three  methods  depend  on  the  degree 
of  deflation  in  TREEQL,  the  amount  of  matrix  splitting  in  TQL2,  and  the 
clustering  of  eigenvalues  for  B/III.  Because  they  illustrate  the  range  of  results 
for  all  matrices  from  [14],  we  use  the  three  test  matrices  [1,2,1],  W+ ,  and  [l,u,l] 
as  a  basis  for  comparison.  The  matrix  [l,u,l]  has  ones  in  its  first  subdiagonal  and 
superdiagonal  and  the  value  i  x  10-6  in  the  ith  diagonal  position.  It  undergoes 
little  deflation  when  its  eigendecomposition  is  computed  by  TREEQL.  As  none 
of  these  test  matrices  contains  row  sums  of  widely  differing  magnitudes,  we 
exclude  IMTQL2  from  the  comparison:  the  performance  and  accuracy  of  TQL2 
and  IMTQL2  are  nearly  identical  for  these  test  matrices  [14]. 

Table  10  shows  that  the  maximal  residual  TZ  —  max1<«n||  Tin  —  A<ti,  j|2 
and  deviation  from  orthogonality  O  =  ||  UTU  —  I  [|co  of  the  eigendecomposi- 
tions  computed  by  the  three  methods  do  not  differ  significantly. 


29 


TREEQL 

TQL2 

B/III 

3.26d-15 

1.52d-15 

1.80d-16 

5.59d-15 

1.30d-14 

6.20d-15 

n  =  100  or  105 

— 

TREEQL 

TQL2 

B/III 

6.07d-14 

2.39d-15 

3.67d-15 

2.75d-15 

1.06d-14 

8.52d-14 

n  =  512  or  525 

TREEQL 

TQL2 

B/III 

3.96d-15 

1.66d-14 

6.05d-15 

1.67d-13 

2.50d-13 

7.92d-13 

Table  10:  Maximal  residual  and  orthogonality  of  eigendecompositions  computed 
by  B /III,  TREEQL,  and  TQL2  for  matrices  [1,2,1],  Wf,  [l,u,l]. 
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As  shown  in  Figures  1-3,  however,  the  computation  times  for  the  problems 
can  differ  significantly.  The  top  graphs  in  these  figures  show  the  different  com¬ 
putation  times  for  matrix  orders  n  <  60.  B/III  is  slowest  for  matrices  [1,2,1]  and 
[l,u,l]  of  order  n  <  20.  TQL2  is  fastest  for  [1,2,1]  and  [l,u,l]  of  order  n  <  20 
and  slowest  for  all  matrices  of  order  50  <  n  <  525.  TREEQL  is  fastest  for  W+ 
of  all  orders  and  for  [1,2,1]  of  ordqfc20  <  n  <  60  due  to  moderate  ([1,2,1])  and 
heavy  (W+)  deflation.  The  bottom  graphs  in  Figures  1-3  show  the  different 
computation  times  for  matrix  ordfts  60  <  n  <  512.  For  n  =  512,  TREEQL 
is  about  2  to  40  times  faster  than  TQL2,  and  B/III  is  about  eight  times  faster 
than  TQL2. 

Because  the  degree  of  deflation  and  the  grouping  of  eigenvalues  are  rarely 
known  in  advance,  it  is  generally  not  possible  to  select  the  fastest  serial  method 
for  a  given  problem.  In  all  of  our  experiments,  however,  B/III  is  much  faster 
than  TQL2  and  equally  accurate.  For  larger  matrix  orders,  B/III  is  fastest  for 
light  to  medium  deflation,  while  TREEQL  is  fastest  for  heavy  deflation. 

7  A  Statistical  Analysis  of  Inverse  Iteration 

The  preceding  sections  experimentally  establish  the  design  choices  for  an  ac¬ 
curate  implementation  of  inverse  iteration.  Because  they  rely  on  the  use  of 
starting  vectors  with  randomly  distributed  components,  we  now  give  a  statisti¬ 
cal  analysis  to  explain  some  of  the  experimental  observations.  We  proceed  as 
follows.  Section  7.1  states  our  assumptions;  Section  7.2  defines  a  good  eigen¬ 
vector  approximation;  Section  7.3  determines  the  expected  quality  of  a  random 
starting  vector  and  briefly  discusses  the  limitations  of  the  analysis,  and  Section 
7.4  estimates  the  error  in  applying  the  analysis  based  on  starting  vectors  with 
normally  distributed  components  to  starting  vectors  with  uniformly  distributed 
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Figure  1:  Times  for  TQL2,  TREEQL,  and  B/III  versus  matrix  order  for  matrix 


0 


100 


500 


200  300  4CQ 

motrix  order 
glued  WllKlnson  matrix 

Figure  2:  Times  for  TQL2,  TREEQL,  and  B/III  versus  matrix  order  for  the 
glued  Wilkinson  matrix  W+ . 
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components.  Statistical  analyses  of  methods  for  computing  eigen  values  can  be 
found,  for  example,  in  [9,  16]. 


7.1  Assumptions 

A  unit-norm  starting  vector  y  —  x/\\  x  ||2  is  computed  from  a  vector  x  with 
independent  random  components,  each  of  which  has  a  normal  distribution  with 
mean  0  and  variance  1  (normal  (0,1)).  Such  vectors  y  are  uniformly  distributed 
on  the  unit  n-sphere  [8].  Because  the  distribution  of  their  components  is  in¬ 
variant  under  orthogonal  transformations,  these  vectors  can  be  represented  in 
terms  of  the  orthonormal  basis  of  eigenvectors  (ui,  U2, . . . ,  u„}  of  the  symmetric 
tridiagonal  matrix  T: 

n  n 

y=(Vi,V2,  -,Vnf,  where  y=^%u<,  ||  u<  ||2  =  1,  ^rj,2  =  l. 

«=i  «=i 

7.2  The  Quality  of  an  Approximate  Eigenvector 

A  vector  y  as  defined  above  is  a  good  approximation  to  u,-  if  ru  is  much  larger 
than  any  other  component,  i.  e.  if  tj?  >  1  —  e2  for  some  error  tolerance  0  < 
€  <  l.  Geometrically,  the  angle  #,■  between  y  and  ut  satisfies  cos 0,  >  Vi  — e2 . 
Similarly,  y  is  a  good  approximation  to  a  linear  combination  of  eigenvectors  uj, 
. . .,  Ud  if  Vi  >  1  —  c2.  Because  random  vectors  are  uniformly  distributed 
on  the  sphere,  the  probability  that  y  is  a  good  approximation  to  the  linear 
combination  is  just  the  fraction  of  the  surface  area  of  the  sphere  defined  by  all 
vectors  whose  components  £i ,...,£<*  satisfy  ^  1  -  £2- 

The  probability  that  J2i=i  Vi  >  1  —  e2  is  determined  by  integrating  the 
probability  density  function  of  the  sum  ]Ti=i  Vi  between  1  —  t2  and  1.  If  the 
component  £,  has  a  normal  (0,1)  distribution,  then  vl  =  £?/£>= ^ias  a 
B( 5,  ^yl)  distribution,  and  the  sum  J2i=i  has  a  #(f>  !if^)  distribution  [8] 
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with  probability  density  function  t“i(l  —  t)^ .  The  probability  that  y  is  a 
good  approximation  to  a  linear  combination  of  eigenvectors  «i, . . . ,  is  thus 
given  in  the  following  theorem: 


Theorem  7.1  Let  x  have  independent  random  components,  each  with  a  normal 
(0,1)  distribution,  and  let  y  =  x/\\  x  |jj  =  (rji, . .  .,r]n)T.  Gwen  0  <  e  <  1,  the 
probability  that  r?2  >  1  —  e1  is 


,1-e2 

>  1  -  <2)  >  1  -a  I  <’~l(l-0 

i  =  l 


'  dt  =  1  -  oJ 


(1) 


with  a  = 


7(3) 


7.3  The  Quality  of  the  Starting  Vectors 


The  experiments  in  Section  4  show  that  random  vectors  make  good  starting 

vectors  for  inverse  iteration.  It  turns  out  that  the  random  starting  vectors 

used  were  linearly  independent  and  not  orthogonal  to  the  eigenvectors  being 

computed.  In  this  section,  we  demonstrate  the  practical  usefulness  of  Theorem 

7.1  and  establish  the  number  of  times  a  random  starting  vector  can  be  reused 

for  computing  eigenvectors  associated  with  well-separated  eigenvalues. 

For  rapid  convergence  of  an  iterate  to  an  eigenvector  u,-,  it  is  essential  that 

the  starting  vector  y  have  a  large  enough  component  in  the  u *  direction.  The 

_ 

probability  that  a  component  is  of  size  at  least  \/l  —  e2  is  given  in  Theorem 
7.1  with  d  =  1.  Table  11  gives  tlifise  probabilities  for  matrix  orders  n  =  100, 
1000,  and  10000  for  a  range  of  1  —  f2  values.  The  integral  in  Theorem  7.1  was 
computed  by  Gauss-Legendre  quadrature  with  100  nodes.  For  n  <  10000,  the 
probability  that  any  one  component  of  y  is  at  least  .0001  is  no  smaller  than 
1  -  10~16,  while  the  probability  that  any  one  component  is  at  least  .001  is  no 
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|g 

for  n  =  1000 

for  n  =  10000 

PC!*!  >  vT^T?) 

P(\m l>  vT-<2) 

Q£|| 

1 

■ 

■ 

1  wBM 

1.00  (16) 

0.99 

0.92 

0.34 

0  (16) 

■ 

1 

1.00  (16) 

0.90 

0.34 

0  (16) 

0  (16) 

Table  11:  Lower  bounds  on  the  probability  that  rjf  >  1  —  e2.  Numbers  in 
parentheses  indicate  the  number  of  zero  decimal  places. 


smaller  than  0.9.  The  unit  two-norm  of  the  starting  vector  guarantees  that  not 
all  components  are  very  small.  Recall  that  a  component  of  size  10-a  is  suffi¬ 
cient  for  fast  convergence.  For  a  given  toleranc0  1  —  e2,  the  probability  bounds 
decrease  as  n  increases:  as  the  number  of  components  in  a  vector  increases,  the 
probability  that  any  one  component  is  large  decreases. 

Table  4  shows  that  sets  of  n  randomly  generated  starting  n-  vectors  tend  to  be 
linearly  independent.  We  call  a  set  of  vectors  {x1( . .  ( numerically )  linearly 

dependent  up  to  a  tolerance  €  >  0  if  there  exists  a  set  of  nonzero  coefficients 

Ori,  02 . On  such  that  l  Qi  =  1  and  II  2  II2  =  ||  Yli= 1  II2  <  f.  The 

following  theorem  gives  an  upper  bound  on  the  probability  that  xi,.  .,xn  are 
linearly  dependent  up  to  tolerance  e. 

Theorem  7.2  (Linear  Independence  of  Starting  Vectors)  Assume  that  the 
vectors  x,  have  independent  random  components  with  normal  (0,1)  distributions 
and  that  z  =  X2<=1  aixi  with  a*  =  T  Given  e  >  0,  the  probability  that 

II  x  1 1 2  <  e  is  bounded  above  by 

P(\\z  ||2  <  e,  < 

Proof:  Let  x<  =  (fij.&i,  •  ■  ■  ,  £ni)r,  !<*'<«,  where  each  component  has  a 
normal  (0,1)  distribution.  If  :  =  a>x<  =  (C1.C2,  •  •  Xn)T  with  £"=1  a ?  = 


37 


1  then  the  component  £  =  ^^7=1  has  a  normal  (0,1)  distribution  and 

probability  density  function  f(t)  =  This  gives  an  upper  bound  on 

P(  II  *  Hoc  <  &. 

n 

P(\\  *  Hoc  <  o  =  P{\ C.l  <  «,  1  <  i  <  n)  <  Z  P(IC<I  <  0  =  nPflCil  <  «)• 

i  =  1 

Because 

P(ICil  <  0  =  /*(-«  <  Ci  <  «)  <  P( Ci  <  0  -  P«i  <  -<) 

we  have 

~P(\\  z  ||oo  <  0  <  f  /(*)<**  -  [  ‘  f(*)dx  =  2  f  f(x)dz  < 

^  */  —  oo  J  —  oo  JO  V27T 

Because  ||  z  ^  <  ||  z  ||2,  P(||  z  ||oo  <  «)  >  P(ll  *  lb  <  f),  and 

p(IMU<0<-t=- 

■ 


For  the  computation  of  eigenvectors  •’ssociated  with  well-separated  eigenval¬ 
ues,  linear  independence  of  the  starting  vectors  seems  less  important,  and  one 
could  try  to  use  the  same  random  starting  vector  for  all  of  them.  Given  e  >  0, 
the  same  starting  vector  y  can  be  used  to  compute  eigenvectors  ui,  . . .,  Ud  if 
qf  >  1  —  e~  for  1  <  i  <  d.  The  following  theorem  shows  how  the  number  d  of 
eigenvectors  for  which  y  can  be  used,  depends  on  the  probability  p  with  which 
each  of  the  d  eigenvectors  provides  a  significant  contribution  of  y. 

Theorem  7.3  (Reuse  of  Starting  Vectors)  Assume  that  x  has  independent 
random  components  with  normal  (0,1)  distributions  and  that  y  =  x/||  x  ||2.  If 
d  <  \}jd£- ),  then  r}(  >  1  -  e2  for  1  <  i  <  d  with  probability  at  least  p. 
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Proof:  Let  y  =  x/\\  x  ||2  =  (m,  ■  ■  ■ .  ln)T  ■  Then 


P(r)?  >  1  -  e2, 1  <  i  <  d)  -  1  -  P{rfi  <  1  —  c2, 1  <i<d) 

d 

>  1-^  P0K  <  1  ~  f2)  =  1  -  dal, 

i= l 

where  the  last  equality  comes  from  Theorem  7.1.  Setting  p  =  1  —  dal  gives 


d<iL: 


Table  12  shows  values  of  d  for  several  choices  of  V"l  —  c2  when  n  =  100,  1000, 
10000:  the  number  of  times  y  can  be  reused  decreases  with  increasing  matrix 
order  n,  for  fixed  p  and  e.  This  echoes  the  trend  observed  in  Table  11:  a  long 
vector  of  norm  one  is  less  likely  to  have  large  components  and  so  is  less  acceptable 
for  reuse.  From  the  numerical  experiments,  we  know  that  a  component  of  size 
10-8  suffices  for  rapid  convergence.  According  to  Table  12  all  components  are 
of  size  10"4  for  random  vectors  up  to  length  1000  with  probability  0.99  so  that 
the  same  starting  vector  can  be  used  to  compute  all  eigenvectors  for  matrices 
of  order  up  to  n  =  10000  with  probability  0.99. 

Unfortunately,  the  applicability  of  our  analysis  to  the  results  of  inverse  it¬ 
eration  is  extremely  limited.  If  z  =  (T  —  Xjl)~ky  =  (0, . .  .,Cn)T  >s  the  unnor¬ 
malized  Jfcth  iterate,  and  no  reorthogonalization  has  taken  place,  the  probability 
that  any  one  component  is  larger  in  magnitude  than  Vl  —  ^  is  [14] 


Cj  >  1  ~  <2)  >  1  ~  a  / 

Jo 


“  £2) 


According  to  the  mean-value  theorem  for  definite  integrals,  the  integral  1  is 
bounded  above  by  (A ,  -  Aj)2l(l  -  e2).  Thus,  if  A,  is  a  good  approximation  to 
Aj,  then  1  is  small  and  the  probability  that  zj  approximates  Uj  is  close  to  one. 
If  A j  -  A j  =  A;  +  1  —  A j,  then  Zj  approximates  xij  and  Uj+i  with  equal  probability. 
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for  n  =  100  for  n  =  1000  for  n  =  10000 
d  d  d 


5 

<  1 
<  1 


10000 

1 

<  1 
<  1 


Table  12:  The  number  of  times  d  a  starting  vector  can  be  used,  when  rj?  >  1  —  e2 
for  1  <  t  <  d  with  probability  p. 

When  Xj  and  AJ  +  1  are  close  but  not  equal,  one  of  the  two  eigenvectors  Uj  and 
Uj  + 1  will  be  approximated  better  than  the  other  only  if  (1  -  e2)(A;  -  A,)  and 
(1  -  c2)(Aj+i  -  A;)  lie  where  the  integrand  f(t)  —  t~i(l  —  t )s7~  has  a  large 
derivative.  Hence,  although  the  effects  of  additional  iterations  on  the  accuracy 
can  be  assessed  in  terms  of  the  integral  I,  a  qualitative  interpretation  seems 
difficult. 

Just  as  the  statistical  analysis  falls  short  in  determining  the  preferred  number 
of  iterations,  it  fails  regarding  stopping  and  reorthogonalization  criteria:  even 
the  simplest  approximations  of  distributions  become  unwieldy  [3,  14,  15],  and 
probability  density  functions  become  dependent  on  the  exact  eigenvalues  and  on 
other  assumptions  that  are  difficult  to  verify  [14].  Therefore,  we  did  not  extend 
the  statistical  analysis  to  the  iterates. 

7.4  Practical  Considerations 

The  preceding  statistical  analysis  qualitatively  confirms  the  experimental  ob¬ 
servations  regarding  the  starting  vectors  in  Section  4.  However,  the  analysis  is 
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lor  n  —  1000 

Vl  -  e2 

EflSSEu 

^(kl  >  VI  -  e2) 

EH3E5iil 

1 

1.00  (16) 

1.00  (16) 

0.99 

0.92 

0.34 

0  (16) 

1.00  (16) 

0.99 

0.36 

0.33 

0(2) 

0  (16) 

1.00  (16) 

0.90 

0.34 

0(16) 

0(16) 

0(16) 

Table  13:  Lower  bounds  on  the  probability  that  rjf  >  1  —  c2.  Numbers  in 
parentheses  indicate  the  number  of  zero  decimal  places. 


based  on  starting  vectors  with  independent,  normally  distributed  components, 
while  the  experiments  were  performed  with  uniformly  distributed  components 
in  [-1,1]  having  some  degree  of  dependence.  Thus,  the  experimental  starting  vec¬ 
tors  of  length  n  are  not  uniformly  distributed  on  the  unit  n-sphere  but  rather  on 
an  n-cube  of  height  2.  Although  normally  distributed  pseudorandom  numbers 
can  be  generated  at  a  higher  cost  than  uniform  ones  [8],  we  will  now  show  that 
uniform  random  numbers  are  acceptable  substitutes. 

The  error  in  applying  the  analysis  to  uniformly  distributed  starting  vectors 
may  be  estimated  by  circumscribing  an  n-sphere  of  radius  y/n  about  the  hy¬ 
percube.  A  vector  y  =  (r?i.  •  •  •.  77n)r  =  f7iu,  on  this  sphere  is  a  good 

approximation  to  a  multiple  y'riuj  of  the  eigenvector  u,-  if  |r/j|  >  >/n(  1  —  e2). 
Following  the  derivation  in  Section  7.2,  the  probability  of  this  occurrence  is 

^(n,2  s!  n(l  -  e2))  =  1  -  a  /  <”^(1  -  t)^dt. 

Jo 

Lower  bounds  for  this  probability  computed  by  Gauss- Legendre  quadrature  are 
given  in  Table  13. 

The  probability  of  a  large  component  ru  in  the  uniform  case  is  not  as  high  as 
in  the  normally  distributed  case,  but  components  with  magnitude  10-6  can  be 
expected  with  near  certainty,  and  this  value  still  suffices  for  fast  convergence. 
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The  linear  independence  of  the  pseudorandom  starting  vectors  is  demonstrated 

in  Table  4. 
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